SYSTEMS AND METHODS 
FOR IMPLEMENTING HASH ALGORITHMS 



This application claims the benefit of Provisional Application 60/223,316 of Jabari Zakiya 
filed August 7, 2000 for METHOD FOR IMPLEMENTING THE SECURE HASH ALGORITHM 
AS A HARDWARE LOGIC GATE, the contents of which are incorporated herein. 

Field of Invention 

This invention relates to the field of data encryption, cryptographic hash algorithms, and 
more particularly to methods and systems for implementing cryptographic hash algorithms. 

Background of the Invention 

Hash functions are used to compute a unique condensed representation of a message or a 
data file. An input message of any length < 2^L bits is processed to produce an M-bit message digest, 
or the hash, as the output. A cryptographic hash function is considered secure when it is 
computationally infeasible to find a message which corresponds to a given hash value, or to find two 
different messages which produce the same hash. Any change to a message in transit will, with very 
high probability, result in a different hash, causing the signature verification of that message to fail. 

This invention describes a method for implementing the computational core of a hash 
algorithm non-sequentially. It processes an N-bit data block to create an M-bit message digest using 
only combinatorial logic. Thus, this invention describes a method for implementing hash algorithms 
which will create a hash for a block of data in one process (clock) cycle and also produce the hash 
of a Y-block long message in no more than Y process (clock) cycles. 

The current most widely used hash algorithms are MD5 and the Secure Hash Algorithm 
(SHA-1), specified by the National Institute of Standards and Technology (NIST) in FIPS 180-1. 
Newer hashes, SHA-256, SHA-384, and SHA-512, have also been specified by NIST in FIPS 180-2. 
They differ primarily in the length of the hash value, ranging from 128 to 512 bits. An application of 
this invention's methodology herein will primarily focus on implementing these genetically related 
hashes. However, other hash algorithms, such as the RIPEMD family (also genetically related to 
the above algorithms), can be similarly decomposed into their generic structures and implemented. 

A consequence of this invention's design philosophy is a tradeoff of hardware resources (gates) 
for clock cycles (time). This enables algorithms to be implemented architecturally in the fastest 
manner possible, which creates several advantages over sequential devices. First, all external 
clocking circuitry is eliminated, making systems easier to design and reducing their parts count. 
Thus, physical systems can be made smaller, use less power, and produce less heat, which 
increases their reliability and results in significant reductions in total system costs. 

Even more important, this invention enables hash algorithms to meet the performance 
requirements of new Internet broadband rates, cell phones, and other high-speed uses. This will 
become increasingly important as the requirements for authentication, and the use of digital 
signatures, expand to meet the needs of e-commerce, secure financial transactions, secure e-mail, 
and other applications driven by privacy and security concerns. 

Objects of the Invention

It is an object of the present invention to create a method to perform hash algorithms as logic
gate functions using only combinatorial non-sequential logic.

Another object of the invention is to perform hash algorithms architecturally in the fastest
manner.

Still another object of the invention is to create a method to perform hash algorithms which
eliminates the need for external clocking circuitry.

A further object of the invention is to minimize a physical system's complexity and parts
counts to perform hash algorithms.

Yet another object of the invention is to create the lowest power consuming and heat
dissipating architectures for implementing hash function devices.

Still yet another object of the invention is to maximize a hash system's reliability.

Another object of the invention is to minimize total system costs to perform hashes.

Still a further object of the invention is to allow hash algorithms to be easily configurable
in systems implementing the Digital Signature Standard and other cryptographic protocols.

Still another object of this invention is to produce simple HDL device models which can
implement a hash algorithm in FPGA, ASIC, and VLSI designs, using various device technologies.

Summary of the Invention

It is therefore an object of the present invention to describe methods and systems to perform
hash algorithms as logic functions comprised totally of non-sequential combinatorial logic. This
is achieved through the creation of a non-sequential decomposition of a hash algorithm. This
decomposition produces various embodiments of combinatorial logic elements which are simply
connected together to perform the algorithm. This enables the creation of an architecture for
performing hash algorithms in an extremely simple and fast manner.

Brief Description of the Drawings

The objects, features, and advantages of the present invention will be apparent from the
detailed description of the preferred embodiments with references to the following drawings.

FIG. 1 is a block diagram of a generic architecture to perform hash algorithms.

FIG. 2 is a block diagram of the architectural structure for MD5.

FIG. 3 is the generic block structure of the round functions for MD5.

FIG. 4 is a block diagram of the architectural structure for SHA-1.

FIG. 5 is the generic block structure of the round functions for SHA-1.

FIG. 6 is a block diagram of the architectural structure for SHA-256/384/512.

FIG. 7 is the generic block structure of the round functions for SHA-256/384/512.

FIG. 8 lists the renamed nonlinear functions and their round usage.

FIG. 9 is the generic block structure of the multi-hash round functions for MD5/SHA-1.

FIG. 10 is a block diagram of a multi-hash structure to implement both MD5 and SHA-1.

Detailed Description

Hash algorithms typically involve two stages of processing. The first stage consists of
creating message blocks of the required length, based on an algorithm's protocols. This includes
performing block padding and inserting the bit count of the message into a block when necessary.
The second stage consists of the hash computation. This invention describes methods and systems
to perform the hash computation stage for hash algorithms.
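
By way of illustration only, the first processing stage for the algorithms that use 512-bit
blocks may be sketched in Python as follows. The function name pad_message and its parameter
little_endian_length are assumptions made for this sketch and do not appear in the specification.
The padding rule shown (a single 1 bit, zero fill, then a 64-bit bit count) is the one published
for MD5 (little-endian count) and for SHA-1 and SHA-256 (big-endian count).

```python
import struct


def pad_message(message, little_endian_length=False):
    """Split a message into 64-byte (512-bit) blocks with padding and bit count.

    little_endian_length is a hypothetical switch for this sketch:
    True for MD5, False for SHA-1 and SHA-256.
    """
    bit_count = (len(message) * 8) % (1 << 64)
    padded = message + b"\x80"                      # a single 1 bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64)   # leave 8 bytes for the count
    padded += struct.pack("<Q" if little_endian_length else ">Q", bit_count)
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = pad_message(b"abc")
print(len(blocks), len(blocks[0]))
```

Only the blocks produced by this stage are inputs to the hash computation stage that the
invention addresses.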

Fig. 1 is a generic block diagram of a hash algorithm. An N-bit message block Mi 100 is the
input. For MD5 and SHA-1/256, a message block is 512 bits, while for SHA-384/512 it is 1024 bits.
The output hash value 160 of a message block consists of the values H0'-Hj'. Full hash values
consist of four 32-bit values (128 bits) for MD5, five 32-bit values (160 bits) for SHA-1, eight
32-bit values (256 bits) for SHA-256, six 64-bit values (384 bits) for SHA-384, and eight 64-bit
values (512 bits) for SHA-512. While the hash is used as a contiguous bit value, it is usually
produced as separate smaller words, typically called chaining values.

A message is hashed in the following manner. A message of any length < 2^L bits (L is 64
or 128 for the above hashes) is processed into message blocks of N bits. Each message block Mi
undergoes some processing, as shown in 105, to produce a message schedule 110, which consists
of the values W0-W(t-1). For MD5, this processing consists of merely splitting Mi into 16 32-bit
words, while for the SHA family of hashes it involves more elaborate processing. These Wi are
inputs into the round functions 140.

The round functions 140 also have as an input the intermediate hash values. Each 140
produces new intermediate output hash values for the number of rounds specified by the algorithm.
The initial hash value 120 (H0-Hj) is added at 150 to the last round's hash to produce the final
hash value 160 for the message block Mi. This becomes the new initial hash value 120 for the next
message block or the final hash value after the last block. The initial hash value for the first block
is specified by the hash algorithm.

The round functions 140 perform various arithmetic and logic operations, which may also
require the use of specified values other than the intermediate hash values and message schedule
values. Also, the internal computational functions and structures will generally not be the same for
each round. The rounds typically range from 64 (MD5 and SHA-256) to 80 (SHA-1/384/512).

The block structure of Fig. 1 has been traditionally implemented as a sequential clocked
network, usually requiring at least as many clock cycles as rounds. This invention implements the
structure of Fig. 1 by creating separate instantiations of the round functions and message block
processing elements, which are then simply connected together.
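
The structure of Fig. 1 may be modeled, purely as a sketch, by a chain of stateless functions
in which each round's output feeds the next round's input. The names hash_block, hash_message,
and toy_round are assumptions introduced for this illustration, and toy_round is a placeholder
rather than a real hash round. Software evaluates the chain one call at a time, whereas the
hardware described here would instantiate every round as separate logic.

```python
from functools import reduce

MASK32 = 0xFFFFFFFF


def hash_block(initial_hash, schedule, round_functions):
    """Evaluate one message block as a chain of round functions.

    initial_hash    -- tuple of 32-bit chaining values (element 120)
    schedule        -- schedule words, one per round (element 110)
    round_functions -- one callable per round (elements 140), each mapping
                       (chaining_values, w) to new chaining values
    """
    last = reduce(lambda h, step: step[0](h, step[1]),
                  zip(round_functions, schedule), initial_hash)
    # Element 150: add the initial hash to the last round's hash, word by word.
    return tuple((a + b) & MASK32 for a, b in zip(initial_hash, last))


def hash_message(initial_hash, schedules, round_functions):
    """Each block's output becomes the initial hash for the next block."""
    return reduce(lambda h, w: hash_block(h, w, round_functions),
                  schedules, initial_hash)


def toy_round(h, w):
    """Placeholder round (hypothetical): swap the two words and mix in w."""
    a, b = h
    return (b, (a + w) & MASK32)


print(hash_block((1, 2), [10, 20], [toy_round, toy_round]))
```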

Fig. 2 shows the generic block structure for MD5. It requires 64 rounds consisting of the
four distinct round functions 240-243 (F1-F4), each used for 16 rounds. Message block processing
for MD5 consists of splitting Mi into 16 32-bit words 210 W0-W15. For each 16 round group, a
different permutation of the Wi is input into each Fi. The initial hash value 255 (H0-H3) is used
for the first (or only) block of a message, and becomes the first hash when the system is initialized
for each message. The output HASH 260 is the final hash value for each Mi block.

Fig. 3 shows a generic structure for the MD5 round functions 240-243. The input hash is
the 4 32-bit chaining values A-D 301-304 and the output hash is A'-D' 310-313. Each round also
has 32-bit input words Wi 305 and constant value Ki 306. MD5 specifies a different Ki for each
round. The value S specifies the number of bits of rotation for the 32-bit left rotate operation 330.
For F1 S = (7, 12, 17, 22), for F2 S = (5, 9, 14, 20), for F3 S = (4, 11, 16, 23), and for F4
S = (6, 10, 15, 21). These values are used every fourth round within the 16 round group for each
function. The nonlinear function 320 is specified as f1(X,Y,Z) = [X AND Y] OR [~X AND Z] for F1,
f2(X,Y,Z) = [Z AND X] OR [~Z AND Y] for F2, f3(X,Y,Z) = X XOR Y XOR Z for F3, and
f4(X,Y,Z) = Y XOR [~Z OR X] for F4. A round also performs 4 32-bit additions 340-343.
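
As a non-limiting software sketch, the MD5 block computation described above can be written so
that each of the 64 rounds is one application of the same small step. The word permutations and
the sine-derived constants Ki are taken from the published MD5 definition (RFC 1321), not from
this specification; the names md5_block, W_INDEX, and rotl32 are assumptions for the sketch. The
loop stands in for 64 separately instantiated rounds connected in series.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, s):
    return ((x << s) | (x >> (32 - s))) & MASK32


# Nonlinear functions f1..f4 as given in the text.
F = [
    lambda x, y, z: (x & y) | (~x & z),
    lambda x, y, z: (z & x) | (~z & y),
    lambda x, y, z: x ^ y ^ z,
    lambda x, y, z: y ^ (~z | x),
]
S = [(7, 12, 17, 22), (5, 9, 14, 20), (4, 11, 16, 23), (6, 10, 15, 21)]
# Permutation of the Wi for each 16 round group (per RFC 1321).
W_INDEX = [
    lambda i: i,
    lambda i: (5 * i + 1) % 16,
    lambda i: (3 * i + 5) % 16,
    lambda i: (7 * i) % 16,
]
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK32 for i in range(64)]
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def md5_block(h, block):
    """Hash computation stage for one 64-byte block."""
    w = struct.unpack("<16I", block)
    a, b, c, d = h
    for t in range(64):
        g, i = divmod(t, 16)
        f = F[g](b, c, d) & MASK32
        total = (a + f + w[W_INDEX[g](i)] + K[t]) & MASK32
        # The new value lands in B; the other words are renamed.
        a, b, c, d = d, (b + rotl32(total, S[g][i % 4])) & MASK32, b, c
    return tuple((x + y) & MASK32 for x, y in zip(h, (a, b, c, d)))


# Single padded block for the message b"abc", with the padding built by hand.
block = b"abc\x80" + b"\x00" * 52 + struct.pack("<Q", 24)
digest = struct.pack("<4I", *md5_block(H0, block))
print(digest == hashlib.md5(b"abc").digest())
```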

Fig. 4 shows the block structure for SHA-1. It performs 80 rounds using the four round
functions 440-443, which are used for 20 rounds each. The message block Mi is, again, first split
into 16 32-bit words W0-W15, where W0 is the beginning of a message block. These Wi are used
to create 64 more Wi defined as: for t = 16 to 79,
W(t) = [(W(t-3) XOR W(t-8) XOR W(t-14) XOR W(t-16)) <<< 1].
Element 420 is a 4-input 32-bit XOR function, while 425 is a 1-bit left rotate operation (which
requires no hard logic to perform) and is the revision to the original SHA specification. The initial
hash value 455 (H0-H4) is used for the first (or only) Mi of a message, and is the first hash when
a system is initialized. The output HASH 460 is the final hash value for each Mi.

Fig. 5 shows the generic round structure for SHA-1. The input hash is the five chaining
values A-E 501-505, and the output hash A'-E' 510-514, where A is the first (most significant)
32-bit word of the hash value. The 32-bit words Wi 506 and Ki 507 are also inputs. SHA-1 specifies
only four Ki constants, one for each Fi. It also specifies two fixed 32-bit left rotate operations 530
and 550. The nonlinear function 520 is specified as f1(X,Y,Z) = [X AND Y] OR [~X AND Z] for
F1, f2(X,Y,Z) = X XOR Y XOR Z for F2, f3(X,Y,Z) = [X AND Y] OR [X AND Z] OR [Y AND Z]
for F3, and f4(X,Y,Z) = X XOR Y XOR Z for F4. Four 32-bit additions 540-543 are also performed.
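
The SHA-1 message schedule and round structure described above may be sketched in Python as
follows. This is an illustration under stated assumptions: the four Ki constants, the initial
hash value, and the rotate amounts of 5 and 30 bits are those published in FIPS 180-1 and are
not recited in this specification, and the names sha1_schedule, sha1_round, and sha1_block are
chosen for the sketch.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, s):
    return ((x << s) | (x >> (32 - s))) & MASK32


# Nonlinear functions f1..f4 as given in the text, one per group of 20 rounds.
F = [
    lambda x, y, z: (x & y) | (~x & z),
    lambda x, y, z: x ^ y ^ z,
    lambda x, y, z: (x & y) | (x & z) | (y & z),
    lambda x, y, z: x ^ y ^ z,
]
K = [0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6]
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def sha1_schedule(block):
    """Split the block into W0-W15, then create the 64 additional Wi."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def sha1_round(h, w, k, f):
    """One round: four additions, two fixed rotates, one nonlinear function."""
    a, b, c, d, e = h
    new_a = (rotl32(a, 5) + (f(b, c, d) & MASK32) + e + w + k) & MASK32
    return (new_a, a, rotl32(b, 30), c, d)


def sha1_block(h, block):
    state = h
    for t, w in enumerate(sha1_schedule(block)):
        state = sha1_round(state, w, K[t // 20], F[t // 20])
    return tuple((x + y) & MASK32 for x, y in zip(h, state))


# Single padded block for the message b"abc", with the padding built by hand.
block = b"abc\x80" + b"\x00" * 52 + struct.pack(">Q", 24)
digest = struct.pack(">5I", *sha1_block(H0, block))
print(digest == hashlib.sha1(b"abc").digest())
```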

Fig. 6 shows the generic block structure for SHA-256/384/512. SHA-256 has t = 64 rounds,
while SHA-384/512 has 80. There is now just one generic round function F1 640. Message block
processing produces 64 or 80 Wi. Mi is first split, again, into W0-W15, where each Wi is 32 bits
for SHA-256 and 64 bits for SHA-384/512. These Wi are used to create the additional Wi by the
plurality of expansion elements Wexpand 620. These use functions 625 f1 and 626 f2, which have
the generic structure f(Wi) = ROTR(Ri) XOR ROTR(Rj) XOR SHR(Rk). The R variables specify
how many bits input Wi is rotated (>>>) or shifted (>>) right in each instance. For f1 the R-tuples
are (R1, R2, R3) = (3|1, 7, 18|8) for SHA-256|[384/512], and for f2 the R-tuples are (R4, R5, R6) =
(10|6, 19, 17|61). Three b-bit additions (modulo 2^b) 630 are also performed, where b is either 32
or 64. The Wi are used in ascending order as inputs into the round functions F1. The initial hash
values 655 are either 32 or 64 bits wide, depending on the algorithm, and are different for each
algorithm. The intermediate hashes are computed using all 8 chaining values A-H, but for SHA-384
the final hash is just the first 6 chaining values A-F; otherwise the algorithms are structurally
identical.
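
A minimal sketch of the Wexpand elements, parameterized by word width, is given below. The
rotate and shift amounts are written out in (ROTR, ROTR, SHR) order as published in FIPS 180-2,
which also fixes which of the three amounts in each R-tuple is the shift. The names make_schedule,
sha256_schedule, and sha512_schedule are assumptions for this illustration.

```python
def make_schedule(bits, rounds, sigma0, sigma1):
    """Build a message-schedule expander for words of the given width.

    sigma0 and sigma1 are (rotate, rotate, shift) right amounts for the two
    expansion functions (f1 and f2 in the text).
    """
    mask = (1 << bits) - 1

    def rotr(x, n):
        return ((x >> n) | (x << (bits - n))) & mask

    def expansion_function(x, amounts):
        r1, r2, s = amounts
        return rotr(x, r1) ^ rotr(x, r2) ^ (x >> s)

    def expand(first16):
        w = list(first16)
        for t in range(16, rounds):
            # Three modular additions per additional Wi.
            w.append((expansion_function(w[t - 2], sigma1) + w[t - 7]
                      + expansion_function(w[t - 15], sigma0) + w[t - 16]) & mask)
        return w

    return expand


sha256_schedule = make_schedule(32, 64, sigma0=(7, 18, 3), sigma1=(17, 19, 10))
sha512_schedule = make_schedule(64, 80, sigma0=(1, 8, 7), sigma1=(19, 61, 6))

print(len(sha256_schedule([0] * 16)), len(sha512_schedule([0] * 16)))
```

Only the word width, the round count, and the two amount tuples differ between the two
expanders, which reflects the structural identity noted above.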

The generic block structure for 640 is shown in Fig. 7. The inputs are the eight chaining
values A-H 701-708, as well as Wi 709 and Ki 710, while the output is the hash A'-H' 750-757.
Unique Ki constants are specified for each round for each algorithm. The nonlinear functions
720-723 are f1(X,Y,Z) = [X AND Y] OR [~X AND Z], f2(X,Y,Z) = [X AND Y] XOR [X AND Z] XOR
[Y AND Z], f3(X) = ROTR(S1) XOR ROTR(S2) XOR ROTR(S3), and f4(X) = ROTR(S4) XOR
ROTR(S5) XOR ROTR(S6). For SHA-256 and [384/512], these S-tuples are (S1, S2, S3) = (2|28,
13|34, 22|39) for f3 and (S4, S5, S6) = (6|14, 11|18, 25|41) for f4. Seven b-bit additions
(modulo 2^b) 740-746 are also performed, where b is either 32 or 64.
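
For illustration, the round structure of Fig. 7 can be sketched for SHA-256 (b = 32) as shown
below. The assignment of f1-f4 to particular chaining values, the schedule amounts, and the
derivation of the Ki constants and initial hash from cube and square roots of primes follow the
published FIPS 180-2 definition rather than anything recited here; the names icbrt, schedule,
sha256_round, and sha256_block are assumptions for the sketch.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def icbrt(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = [p for p in range(2, 312) if all(p % q for q in range(2, p))]
K = [icbrt(p << 96) & MASK32 for p in PRIMES]              # 64 round constants
H0 = tuple(math.isqrt(p << 64) & MASK32 for p in PRIMES[:8])


def schedule(block):
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK32)
    return w


def sha256_round(h, w, k):
    """One round: four nonlinear functions and seven 32-bit additions."""
    a, b, c, d, e, f, g, hh = h
    f1 = (e & f) | (~e & g)
    f2 = (a & b) ^ (a & c) ^ (b & c)
    f3 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    f4 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    t1 = (hh + f4 + f1 + k + w) & MASK32      # four additions
    t2 = (f3 + f2) & MASK32                   # one addition
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)


def sha256_block(h, block):
    state = h
    for w, k in zip(schedule(block), K):
        state = sha256_round(state, w, k)
    return tuple((x + y) & MASK32 for x, y in zip(h, state))


# Single padded block for the message b"abc", with the padding built by hand.
block = b"abc\x80" + b"\x00" * 52 + struct.pack(">Q", 24)
digest = struct.pack(">8I", *sha256_block(H0, block))
print(digest == hashlib.sha256(b"abc").digest())
```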

Each of these algorithms can be implemented separately as a physical device by constructing
the necessary round functions, constant values, and message processing elements, and connecting
them as required. The methodology of this invention also enables systems which can perform
multiple hash algorithms to be designed with a minimum set of common computational elements.
Thus, for example, systems needing both MD5 and SHA-1 (required for the Digital Signature
Standard), and/or SHA-256, etc., can be efficiently implemented. This can be accomplished because
these algorithms can be decomposed into a few common computational elements which can be used
to implement them non-sequentially in a cohesive system architecture.

A first step in this process is to identify as many common structures and elements as
possible, first at the highest structural level, then down to lower levels. One output of this process
is the recognition that there are only four distinct nonlinear functions which can be shared between
MD5 and SHA-1. The functions f1 and f2 for MD5 and f1 for SHA-1 are structurally identical and can
be shared. MD5's f3, and f2 and f4 for SHA-1, are also identical. Thus, the four common nonlinear
functions can be renamed to h1(X,Y,Z) = [X AND Y] OR [~X AND Z], h2(X,Y,Z) = X XOR Y XOR
Z, h3(X,Y,Z) = [X AND Y] OR [X AND Z] OR [Y AND Z], and h4(X,Y,Z) = Y XOR [~Z OR X].
Fig. 8(a) shows these four renamed nonlinear functions.
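
The sharing claimed above can be checked exhaustively on single bits, since every function is
bitwise. The sketch below is illustrative; the tables SHARED and ORIGINAL and the idea of
recording an input routing for each function are assumptions of the sketch. It shows in
particular that MD5's f2 is h1 with its inputs routed as (Z, X, Y).

```python
from itertools import product


# The four renamed functions, on single bits (NOT x is written x ^ 1).
def h1(x, y, z): return (x & y) | ((x ^ 1) & z)
def h2(x, y, z): return x ^ y ^ z
def h3(x, y, z): return (x & y) | (x & z) | (y & z)
def h4(x, y, z): return y ^ ((z ^ 1) | x)


# (algorithm, function) -> (shared function, routing as indices into (X, Y, Z))
SHARED = {
    ("MD5", "f1"): (h1, (0, 1, 2)),
    ("MD5", "f2"): (h1, (2, 0, 1)),     # f2(X,Y,Z) = h1(Z,X,Y)
    ("MD5", "f3"): (h2, (0, 1, 2)),
    ("MD5", "f4"): (h4, (0, 1, 2)),
    ("SHA-1", "f1"): (h1, (0, 1, 2)),
    ("SHA-1", "f2"): (h2, (0, 1, 2)),
    ("SHA-1", "f3"): (h3, (0, 1, 2)),
    ("SHA-1", "f4"): (h2, (0, 1, 2)),
}

# The per-algorithm definitions as listed for Figs. 3 and 5.
ORIGINAL = {
    ("MD5", "f1"): lambda x, y, z: (x & y) | ((x ^ 1) & z),
    ("MD5", "f2"): lambda x, y, z: (z & x) | ((z ^ 1) & y),
    ("MD5", "f3"): lambda x, y, z: x ^ y ^ z,
    ("MD5", "f4"): lambda x, y, z: y ^ ((z ^ 1) | x),
    ("SHA-1", "f1"): lambda x, y, z: (x & y) | ((x ^ 1) & z),
    ("SHA-1", "f2"): lambda x, y, z: x ^ y ^ z,
    ("SHA-1", "f3"): lambda x, y, z: (x & y) | (x & z) | (y & z),
    ("SHA-1", "f4"): lambda x, y, z: x ^ y ^ z,
}


def matches(key):
    """True if the original function equals its shared form on all inputs."""
    shared, route = SHARED[key]
    return all(ORIGINAL[key](*bits) == shared(*(bits[i] for i in route))
               for bits in product((0, 1), repeat=3))


print(all(matches(key) for key in ORIGINAL))
```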

A next step is to identify for which rounds these nonlinear functions are used. Fig. 8(b) maps
the use of each h for each algorithm for different round groups. It shows there are 8 distinct round
groupings. For Group 1 h1 is common to both algorithms, and for Group 4 h2 is common. For
rounds 65-80 (Group 8) only h2 is used, for SHA-1. For round Groups 2, 3, and 5-7, a switching
network 830 routes the selected output from the nonlinear function pair 820 hi or 825 hj, whose
inputs are the correctly routed chaining values B, C, and D, to a round function. In 830 hi and hj
represent the appropriate nonlinear functions for a Group, for MD5 and SHA-1.
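
The eight round groupings can be derived mechanically from the round ranges over which each
algorithm uses each function. The sketch below is an illustration: the usage tables are inferred
from the function definitions given for Figs. 3 and 5 rather than copied from Fig. 8(b), and the
label "h1 (Z,X,Y)" is an assumption used to mark MD5's second function, which is h1 with its
inputs rerouted.

```python
# (first round, last round, function) for each algorithm, rounds numbered from 1.
MD5_USAGE = [(1, 16, "h1"), (17, 32, "h1 (Z,X,Y)"), (33, 48, "h2"), (49, 64, "h4")]
SHA1_USAGE = [(1, 20, "h1"), (21, 40, "h2"), (41, 60, "h3"), (61, 80, "h2")]


def function_for(usage, rnd):
    """Return the function used in a given round, or None past the last round."""
    for first, last, name in usage:
        if first <= rnd <= last:
            return name
    return None


def round_groups():
    """Split rounds 1-80 wherever either algorithm changes its function."""
    boundaries = sorted({last for _, last, _ in MD5_USAGE + SHA1_USAGE})
    groups, first = [], 1
    for last in boundaries:
        groups.append((first, last,
                       function_for(MD5_USAGE, first),
                       function_for(SHA1_USAGE, first)))
        first = last + 1
    return groups


for number, group in enumerate(round_groups(), start=1):
    print(number, group)
```

Groups in which the two entries match need no switching network; the remaining groups within
the first 64 rounds correspond to those served by element 830.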

An additional design partitioning optimization is achieved by removing the (Wi+Ki)
additions from the round functions and performing them instead in the message processing block.
Fig. 9 shows a new simplified round function 900 which is used to perform both SHA-1 and MD5.
The inputs consist of the chaining values A, B, and E, hi 906 (the output of 830), and WKi 907, the
(Wi+Ki) sum for the round. The current C and D chaining values are merely renamed and routed
for use in the next round, as shown by 900'. The outputs are the new chaining values A'-C' 910-
913, though B' is just the renamed A chaining value. A multiplexor 935 selects B or E to be added
at 943. The elements 930, 950, and 960 represent the logic to perform the necessary rotate
operations for each hash. This round function structure (with the rotates hardwired for each hash)
can also produce better delay times when each hash is implemented separately.
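
One possible reading of this shared round, offered only as a sketch, is given below. The internal
arrangement of Fig. 9 is not reproduced in the text, so the ordering of the rotates and adders,
the mode argument, and the name shared_round are all assumptions. What the sketch does show is
that a single datapath with WKi precomputed and a B-or-E selection reproduces both the MD5 and
the SHA-1 round arithmetic.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, s):
    return ((x << s) | (x >> (32 - s))) & MASK32


def shared_round(mode, a, b, e, h, wk, s=0):
    """Hypothetical shared round for "sha1" or "md5".

    h  -- output of the selected nonlinear function on (B, C, D)
    wk -- precomputed (Wi + Ki) sum for this round
    s  -- MD5 rotate amount for this round (unused for SHA-1)
    Returns (new_word, routed_b): the newly computed chaining word and the
    B value as passed on (rotated by 30 bits for SHA-1, unchanged for MD5).
    """
    sha1 = mode == "sha1"
    first = rotl32(a, 5) if sha1 else a              # rotate on A (SHA-1 only)
    total = (first + h + wk) & MASK32
    rotated = total if sha1 else rotl32(total, s)    # rotate on the sum (MD5 only)
    new_word = (rotated + (e if sha1 else b)) & MASK32   # select E or B to add
    routed_b = rotl32(b, 30) if sha1 else b          # rotate on B (SHA-1 only)
    return new_word, routed_b


# Arbitrary example values; wk is the sum formed outside the round.
a, b, c, d, e = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0
h = ((b & c) | (~b & d)) & MASK32
w, k = 0x61626380, 0x5A827999
wk = (w + k) & MASK32
print(shared_round("sha1", a, b, e, h, wk))
print(shared_round("md5", a, b, e, h, wk, s=7))
```

Because addition modulo 2^32 is associative, forming WKi outside the round changes where the
adder sits but not the value computed.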

Fig. 10 is a generic structure to implement both SHA-1 and MD5 in one system. Message
block processing now performs the additions of Wi and Ki, along with the creation and multiplexing
of the Ki constants. Multiplexor 1015 represents the selection and routing of the Ki constants to the
1018 adders for each hash for the first 64 rounds. The last 16 WKi words use KS4 for SHA-1. Now
for t total rounds, the WKi 32-bit words 1020 are created and routed to the round functions. Each
Gi 1040 performs the number of rounds shown in Fig. 8(b), which are implemented with elements
830 and 900. For each Gi rounds group the appropriate hi functions are used in the 830 elements,
and the WGi inputs are the required WKi. The system output, selected by multiplexor 1075, will be
the A-D chaining values from Group 7 for MD5, or the last A-E chaining values from round 80 when
SHA-1 is selected.

Design and Performance Issues

The "best" decomposition and partitioning of an algorithm for implementation as a real device
will be determined by several parameters. While this invention describes a non-sequential
methodology to make hash devices and systems, which is inherently faster than sequential design
methodology, design optimization tradeoffs will still exist and must be recognized to create the best
structures to implement. Depending on the performance requirements, some design choices will
be better than others for a specific implementing technology and device architecture.

Generally though, reducing the length of the input-to-output critical delay path (cdp) through
a system is a standard design goal. Reducing the cdp through a system minimizes its total
propagation delay (tpd), which maximizes its speed. Thus, a design goal for implementing a real
device is to make the elements that comprise the cdp as physically "small" or "thin" as
possible so they can be placed as close together as possible. Another goal is to minimize the
intra-component wire routing requirements. As device technologies produce physically smaller
gates, the wiring and routing delays become more dominant, and critical to control.

In Fig. 9 the purpose of moving the (Wi+Ki) adder out of the round function was to reduce its
size (area), which decreases its cdp length, thus lowering its tpd. This also reduces the input data
lines into each round function, enabling them to be placed physically closer together, which reduces
the intra-round routing delay, further reducing the tpd of the entire system. Thus in Fig. 10, the
components that compute the Wi/WKi constant values are all logically grouped in one block. When
building a real device, these components can then be placed and routed separately from the round
function components, which have the highest priority performance routing requirements.

The round functions for these hash algorithms have two critical delay paths: the input hash-
to-output hash path and the Wi (or WKi)-to-output hash path. For the first round function, the initial
hash values are always present before an input block Mi is loaded into the system. Thus, the cdp
for the first round is the W0/WK0-to-output hash path, because until the propagation delay caused
by input W0/WK0 through the first round logic stabilizes, the output hash will not become stable.
Specifically, the A' chaining value will always take the longest time to stabilize for any round.

However, after the first round, the cdp through each round will be the input hash-to-output
hash path, specifically the A-to-A' path. This occurs because after the first round the Wi/WKi
values for all the other rounds become stable inputs into those round functions before the input hash
values become stable into those rounds. Thus, the propagation path of the input hash through the
round logic, to become a stable output hash value, becomes the cdp. Therefore, a device or system
can be fully characterized for performance by measuring the Mi/WK0-to-last A' propagation delay.
The design structure of Fig. 10, then, should be the optimal implementation because it enables
physically smaller and thinner round functions and it reduces the wire routing into the rounds.
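
The reasoning above suggests a simple first-order estimate of the total propagation delay: one
WK0-to-A' delay for the first round, followed by one A-to-A' delay for each remaining round. The
sketch below only restates that arithmetic. The function name, its parameters, and the delay
figures in the example are hypothetical and are not measurements of any device.

```python
def total_propagation_delay(rounds, first_round_delay, hash_path_delay,
                            final_add_delay=0.0):
    """First-order estimate of the Mi/WK0-to-last-A' delay.

    rounds            -- number of rounds in the chain (e.g. 64 or 80)
    first_round_delay -- WK0-to-A' delay through the first round
    hash_path_delay   -- A-to-A' delay through each later round
    final_add_delay   -- optional delay of the final hash addition
    All delays are in the same (arbitrary) time unit.
    """
    return first_round_delay + (rounds - 1) * hash_path_delay + final_add_delay


# Hypothetical figures: 2.0 units for the first round, 1.5 units per later round.
print(total_propagation_delay(80, 2.0, 1.5))
```

Under this estimate, any reduction in the A-to-A' delay is multiplied by nearly the full round
count, which is why that path dominates the design effort.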

It can be seen from Figs. 6 and 7 that it is extremely simple to build a device to implement both
SHA-384 and SHA-512. The structures are identical, requiring only the addition of switching
components to select the correct constants and rotate/shift parameters for each algorithm.

In general, any hash algorithm that can be implemented sequentially can be implemented
using the methodology of this invention. This includes a methodology for achieving an "optimum"
implementation of a hash algorithm for specific implementing technologies. This invention also
presents a structured methodology for implementing multi-hash devices and systems.